feat: WENO5 sum-of-squares beta on uniform grids (cancellation-free smoothness indicators) by sbryngelson · Pull Request #1408 · MFlowCode/MFC

sbryngelson · 2026-05-07T21:07:14Z

Summary

On uniform Cartesian grids, replaces the WENO5 smoothness indicator (beta) computation with the Jiang & Shu (1996) sum-of-squares form where every term is a non-negative perfect square — no cancellation possible.
On non-uniform grids, falls back to the existing precomputed beta_coef form unchanged.
Grid uniformity is detected once at initialization (s_compute_weno_coefficients) by comparing each cell spacing against the mean; result stored in uniform_grid(3) and uploaded to the GPU.

Motivation

The standard precomputed form for the centered stencil is:

beta(1) = 4/3*dvd0² − 5/3*dvd0*dvd(−1) + 4/3*dvd(−1)²

The negative cross-term causes catastrophic cancellation when dvd0 ≈ dvd(−1) (smooth flow), losing several significant bits. The Jiang & Shu sum-of-squares form is algebraically equivalent on uniform grids but has no negative terms:

beta(0) = 13/12*(dvd1 − dvd0)²      + 1/4*(dvd1 − 3*dvd0)²
beta(1) = 13/12*(dvd0 − dvd(−1))²   + 1/4*(dvd0 + dvd(−1))²
beta(2) = 13/12*(dvd(−1) − dvd(−2))² + 1/4*(3*dvd(−1) − dvd(−2))²

Algebraic equivalence on uniform grids was verified by expanding and matching all six coefficients against the precomputed beta_coef values (e.g. 4/3, −5/3, 4/3 for the centered stencil).

Test plan

./mfc.sh precheck -j 8 passes
./mfc.sh build -t simulation -j 8 compiles cleanly
All 10 WENO5 tests pass (1D and 2D, including wenoz, mapped_weno, mp_weno, teno variants): 7789B55A B3E70A3A 48D5C130 9090FC4B 677DAA82 9EF5305B 0ACB1F16 E719C5F2 31742FC7 D6415F48

…ation On uniform Cartesian grids, compute WENO5 smoothness indicators using the Jiang & Shu (1996) sum-of-squares form: beta(0) = 13/12*(dvd1-dvd0)^2 + 1/4*(dvd1-3*dvd0)^2 beta(1) = 13/12*(dvd0-dvd(-1))^2 + 1/4*(dvd0+dvd(-1))^2 beta(2) = 13/12*(dvd(-1)-dvd(-2))^2 + 1/4*(3*dvd(-1)-dvd(-2))^2 Each term is a perfect square — no negative cross-terms, no cancellation. Algebraically equivalent to the precomputed beta_coef form on uniform grids. Non-uniform grids fall back to the existing precomputed coefficients. Uniformity is detected once at init in s_compute_weno_coefficients by comparing each cell spacing to the mean; result stored in uniform_grid(3) and uploaded to GPU.

github-actions · 2026-05-07T21:22:02Z

Claude Code Review

Head SHA: 7c9660d

Files changed:

1
src/simulation/m_weno.fpp

Findings

[Correctness] Uniform-grid beta formulas hardcode exactly 3 stencils — incomplete for WENO7, wrong for WENO3

The new if (uniform_grid(${WENO_DIR}$)) branch assigns exactly beta(0), beta(1), beta(2), which are the Jiang–Shu formulas for WENO5 (3 sub-stencils). The surrounding unchanged code uses do q = 0, weno_num_stencils to loop over all stencils, making the stencil count a runtime quantity.

WENO7 (weno_num_stencils = 3): the loop body accesses beta(3), which the uniform-grid path never assigns. Depending on how beta is declared this is either an out-of-bounds access or silent use of an uninitialized value, corrupting the nonlinear weights and producing wrong reconstruction.
WENO3 (weno_num_stencils = 1): beta(0) and beta(1) are computed with the WENO5 two-point-stencil formulas, which are mathematically incorrect for the one-point WENO3 stencils.

The pre-existing w(1:8) declaration (8 ideal weights = 4 stencils × 2 sides, consistent with WENO7) and the variable-bound loop do q = 0, weno_num_stencils are both present in the same diff context, making this a high-confidence finding.

The non-uniform fall-back path is unchanged and generalises correctly via the precomputed beta_coef_${XYZ}$ arrays, so a simple fix is to guard the uniform-grid branch on weno_num_stencils == 2 (WENO5 only), or extend it with the correct formulas for each supported order.

Relevant changed lines: src/simulation/m_weno.fpp hunk at line ~1069–1091 (new if (uniform_grid(...)) block) and surrounding unchanged context do q = 0, weno_num_stencils.

…cision The previous 1e-10 threshold was below machine epsilon for real(4) (~1.2e-7), causing uniform_grid to always be .false. in --single builds since FP-computed spacings deviate at ~1e-7 level. Using sqrt(epsilon(h0))*abs(h0) is precision-agnostic: ~1.5e-8 relative in double, ~3.5e-4 in single -- above FP noise in both modes.

…ormulation Algebraically equivalent change shifts FP outputs by ~1 ulp, amplified through nonlinear WENO weights to ~2.3% per-point relative on the most sensitive axisymmetric/acoustic-source tests. Verified via convergence study: ||SOS - master||_inf shrinks at >5x WENO5 truncation rate, both forms converging to the same physical limit. Affected tests: - 059D137F: 2D Hypoelasticity 1 Fluid Axisymmetric - 3E23CB5D: 2D Acoustic Source Transducer Array - 6FE484B5: 2D Axisymmetric HLL - 7A9F4EF4: 2D Example axisym_shockbubble - 7C2FF26F: 2D Acoustic Source Transducer Gaussian - 7E15602E: 2D Acoustic Source Transducer Sine - B2CBB9FE: 2D Acoustic Source Transducer Delay - B89B8C70: 2D Axisymmetric model_eqns=3 - DB670E50: 2D Axisymmetric model_eqns=2 Full local 568-test suite passes (44m, serial debug).

qodo-code-review · 2026-05-09T19:46:53Z

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

The low_mach test exercises HLLC's low-Mach correction at near-zero Mach, where (u_R - u_L) is fundamentally small. Verrou MCA sampling variance puts the max_dev distribution typically around 5e-8 with a tail occasionally hitting ~1e-7, making the previous 1e-7 threshold flaky. Loosen to 2e-7 to absorb sampling noise. After MFlowCode#1408 (WENO5 sum-of-squares beta) merges, the WENO contribution to the cancellation chain disappears on uniform grids and the threshold can be tightened back.

codecov · 2026-05-09T21:55:30Z

Codecov Report

❌ Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 64.96%. Comparing base (87f299f) to head (7c9660d).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
src/simulation/m_weno.fpp	92.30%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1408      +/-   ##
==========================================
+ Coverage   64.94%   64.96%   +0.01%     
==========================================
  Files          72       72              
  Lines       18861    18871      +10     
  Branches     1570     1571       +1     
==========================================
+ Hits        12250    12259       +9     
  Misses       5638     5638              
- Partials      973      974       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…or fix (#1403) * feat: add fp-stability command for Verrou-based FP instability testing Adds ./mfc.sh fp-stability — a persistent floating-point stability test suite using Verrou's random IEEE-754 rounding mode. For each registered test case the runner: 1. Generates initial conditions via pre_process 2. Runs simulation once with --rounding-mode=nearest (reference) 3. Runs simulation N times with --rounding-mode=random 4. Reports max L-inf deviation vs threshold (PASS/FAIL) Two cases probe known ill-conditioning in MFC: - sod_strong: 1-D Sod p_L/p_R=100,000 — HLLC xi-factor cancellation (s_L - vel_L)/(s_L - s_S) near sonic contact - water_stiffened: 1-D water shock pi_inf=4046 — pressure recovery p=(E-pi_inf)/gamma loses ~4 decimal digits on low-pressure side Requires a Verrou-enabled Valgrind at $VERROU_HOME/bin/valgrind (default: $HOME/.local/verrou). Silently skips if not found. Binaries are auto-discovered from build/install/ or passed explicitly. * ci: add weekly FP stability workflow using Verrou * ci: trigger fp-stability on every push * ci: remove concurrency group from fp-stability * style: collapse multi-line raise ValueError to satisfy CI ruff formatter * feat: add sod_standard baseline, enlarge cases, run dd_sym on failure - Add sod_standard (p_L/p_R=10, well-conditioned, threshold=1e-13) as a canary that should always pass under any FP rounding - Enlarge sod_strong and water_stiffened from 25→50 cells and 5→10 steps to give Verrou more arithmetic to perturb and raise sensitivity - Run verrou_dd_sym automatically on any failing case: generates dd_run.sh and dd_cmp.py per case, sets PYTHONPATH for verrou Python libs, saves rddmin_summary and full logs to fp-stability-logs/ (uploaded as CI artifact) - Add python3-numpy to CI apt-get and upload-artifact step (if: always()) * feat: add float proxy, VPREC sweep, dd_line, air_water_interface case - Add four new test cases total: sod_standard, sod_strong, water_stiffened, air_water_interface (two-fluid isobaric contact, stiffened EOS) - Feature B: float-proxy run (--rounding-mode=float) measures single-precision sensitivity without recompiling; skippable with --no-float-proxy - Feature C: VPREC mantissa sweep at [52,23,16,10] bits shows precision floor curve for each case; skippable with --no-vprec - Feature D: verrou_dd_sym symbol-level delta-debug on failure; --no-dd-sym - Feature E: verrou_dd_line source-line delta-debug on failure; --no-dd-line - Add --no-float-proxy/--no-vprec/--no-dd-sym/--no-dd-line CLI flags - Remove dead _max_diff() function (superseded by _max_diff_np()) - Update FP_STABILITY_COMMAND description to document all 4 cases and features * fix: add missing air_water_interface case input files * fix: add alpha(2) for both patches in air_water_interface pre_process.inp * fix: add reduced-energy (E-tilde) scheme for model_eqns=2 FP stability Store Ẽ = E - pi_inf_mix (reduced energy) in the conserved energy slot for the Allaire 5-equation model (model_eqns=2). This eliminates catastrophic cancellation when recovering pressure via p = (Ẽ - KE - qv)/gamma vs the old p = (E - pi_inf - KE - qv)/gamma where pi_inf >> p (e.g. water: pi_inf=4046 >> p=1). Key changes: - m_variables_conversion.fpp: scope Ẽ storage to model_eqns=2 only; add explicit model_eqns=1/3 branch (physical E) to avoid fallthrough to Tait EOS for model_eqns=3 (Saurel 6-equation) - m_riemann_solvers.fpp: HLL and LF non-MHD blocks use Ẽ as states with pi_inf added back for enthalpy H; HLLC Block 1 (model_eqns=3) and Block 4 (model_eqns=2) retain their existing correct handling; xi-factor denominators protected with sgm_eps (Fix 1) - m_rhs.fpp: add non-conservative Ẽ source S_Ẽ = -sum_i(pi_inf_i * rhs(alpha_i)) for x/y/z directions, gated on model_eqns==2 and .not.(bubbles_euler .or. mhd) - m_sim_helpers.fpp: igr branch drops pi_inf from pressure recovery (Ẽ stored) and restores it for physical H - m_cbc.fpp: energy flux drops dpi_inf_dt term (Ẽ not pi_inf in flux) * ci: increase fp-stability case step counts to 50 for better sweep coverage * revert: remove reduced-energy (E-tilde) scheme — moving to separate PR * fix: protect HLLC xi-factor denominators from division by zero with sgm_eps * ci: loosen water_stiffened threshold to 1e-8 until Etilde scheme merges The stiffened-EOS pressure recovery p=(E-pi_inf)/gamma suffers catastrophic cancellation (~2.5e-9 max_dev under random rounding vs the previous 1e-10 gate). The algorithmic fix (reduced-energy / Etilde storage scheme) lives on feat/reduced-energy. Until that PR merges, raise the gate to 1e-8 so CI is green while still catching regressions. * ci: GitHub step summary, file annotations, and float-proxy metric in fp-stability On every run, fp_stability.py now writes to GITHUB_STEP_SUMMARY a markdown table with pass/fail, max_dev, float proxy, and the full VPREC precision sweep showing which bit levels (52/23/16/10) fail. On failure, dd_line source locations are emitted as ::warning:: annotations so the responsible Fortran lines appear highlighted directly in the PR diff view. Both are no-ops outside GitHub Actions (env var guards). * ci: remove pass/fail markers from VPREC sweep table The VPREC sweep is a sensitivity curve, not a pass/fail test. Comparing reduced-precision runs against the double-precision threshold marks every case as failing at 23b/16b/10b, which is noise. Show raw dev numbers only; the main table has the actual pass/fail. * fix: move cons.unindent into finally block; use MFC_ROOT_DIR in _find_binary - cons.unindent()/cons.print() were after the try/finally, so any MFCException raised inside _run_case would leave the console permanently over-indented for all subsequent case output. - _find_binary used os.getcwd() which breaks if fp-stability is invoked from a subdirectory; MFC_ROOT_DIR is always correct. * ci: always run dd_line to surface FP hotspots on every run Previously dd_sym/dd_line only ran on test failure, so passing runs gave no source-level instability info. Now dd_sym and dd_line always run, using a sensitivity threshold of max_dev/10 rather than the pass/fail threshold. This isolates the source lines responsible for the dominant FP variation even when the case passes. Results are capped at top 10 locations per case. The GitHub step summary now always shows a 'Top FP hotspots (dd_line)' section, and ::warning:: annotations are emitted for all cases (labelled 'hotspot' on pass, 'FAIL' on failure) so the sensitive Fortran lines are visible in the PR diff regardless of CI outcome. * ci: add bubble_rp and low_mach FP cases; fix dd threshold to use median * fix: bubble_rp pre_process.inp (remove bubble_model); loosen low_mach threshold to 1e-7 * fix: loosen bubble_rp threshold to 1e-8 (RP ODE cancellation observed ~2e-9) * ci: use float-mode dd bisection (deterministic, nruns=1) instead of random * fix: dd_run.sh must honour VERROU_ROUNDING_MODE=nearest for reference run verrou_dd_sym sets VERROU_ROUNDING_MODE=nearest when producing its reference run, then leaves it unset for test runs. Hardcoding --rounding-mode=float as a CLI arg overrides that env var (CLI takes precedence in Valgrind), so both reference AND full-perturbation test end up in float mode, give identical output, deviation=0, and dd exits 42. Fix: use ${VERROU_ROUNDING_MODE:-float} in dd_run.sh so verrou_dd_sym's nearest-rounding reference is preserved, while test steps still default to float mode (deterministic, --nruns=1 sufficient). * feat: add cancellation detection, MCA sig-bits, and float-max checks to fp-stability Three new Verrou analysis passes, each enabled by default with --no-X to skip: F. --check-cancellation=yes (--no-cancellation): uses --cc-gen-file for structured per-line output of catastrophic cancellation sites in MFC source. G. --backend=mcaquad MCA runs (--no-mca): N samples with Monte Carlo Arithmetic; reports max deviation and a significant-bits lower bound: s = -log2(dev/scale). H. --check-max-float=yes (--no-float-max): detects double→float conversions that would overflow to ±Inf; reports source locations from Valgrind error log. Results added to GitHub step summary (cancellation sites table, float-max sites table, MCA sig-bits column in the main results table). * fix: emit up to 3 dd_line + 3 cancellation annotations per case Previously only dd_line locations were annotated (minimal delta-debug set, often just 1 line). Now also emit the top 3 cancellation sites per case as ::notice:: so the PR diff shows a richer set of FP hotspots. * fix: source context in summary, branch filter on workflow, reviewer feedback - Add _get_source_context() to embed 5-line annotated snippets in the GitHub step summary for each dd_line hotspot and cancellation site - Fix workflow on: push: to only fire on master and fp-stability branches, plus add weekly cron schedule (Monday 03:00 UTC) - Fix log_dir to use MFC_ROOT_DIR instead of os.getcwd() - Remove unreachable 'or 5' fallback on ARG('samples') (default=5 in CLI) * fix: eliminate xi_L/R - 1 cancellation in 5-eq HLLC solver In the 5-equation HLLC block, xi_L ≈ 1 near the sonic contact point, so computing (xi_L - 1) by post-hoc subtraction from 1 loses significant bits. Add xi_L_m1 = (s_S - vel_L)/(s_L - s_S), computed directly instead of as xi_L - 1, and use it throughout the mass, momentum, energy, volume-fraction, color-function, and species flux loops. The momentum flux also uses the algebraic identity xi*(dir_flg*s_S + (1-dir_flg)*u_i) - u_i = (dir_flg*s_L + (1-dir_flg)*u_i)*xi_m1 which avoids both the xi - 1 and the xi*s_S - u (normal direction) forms. For the energy flux the star-state term xi_L*(E + expr) - E is rewritten as E*xi_L_m1 + xi_L*expr, removing the E*(xi_L - 1) cancellation. All 162 1D regression tests pass; all 6 FP-stability suite cases pass. * Update fp-stability.yml * fix: loosen low_mach FP-stability threshold to 2e-7 The low_mach test exercises HLLC's low-Mach correction at near-zero Mach, where (u_R - u_L) is fundamentally small. Verrou MCA sampling variance puts the max_dev distribution typically around 5e-8 with a tail occasionally hitting ~1e-7, making the previous 1e-7 threshold flaky. Loosen to 2e-7 to absorb sampling noise. After #1408 (WENO5 sum-of-squares beta) merges, the WENO contribution to the cancellation chain disappears on uniform grids and the threshold can be tightened back. * fix: extend xi_L_m1/xi_R_m1 cancellation fix to all four HLLC blocks Blocks 1 (model_eqns==3), 2 (model_eqns==4), and 3 (model_eqns==2 with bubbles_euler) still computed xi_L/R - 1 by subtraction, losing bits near a sonic contact where xi_L/R approx 1. Add xi_L_m1/xi_R_m1 = (s_S - u_L/R)/(s_L/R - s_S) to all three blocks and replace xi_L - 1 in mass, advection, bubble, and vel_src flux loops. * refactor(fp-stability): replace committed .inp files with Python dicts in fp_stability.py * refactor(fp-stability): unify CASES + params into one list, add _merge helper

sbryngelson added 5 commits May 7, 2026 18:52

Merge branch 'master' into weno-beta-sos

63e7833

Merge branch 'master' into weno-beta-sos

5bb6a34

Merge branch 'master' into weno-beta-sos

7c9660d

sbryngelson marked this pull request as ready for review May 9, 2026 19:46

sbryngelson merged commit 42f003e into MFlowCode:master May 10, 2026
133 of 136 checks passed

sbryngelson deleted the weno-beta-sos branch May 10, 2026 00:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: WENO5 sum-of-squares beta on uniform grids (cancellation-free smoothness indicators)#1408

feat: WENO5 sum-of-squares beta on uniform grids (cancellation-free smoothness indicators)#1408
sbryngelson merged 6 commits intoMFlowCode:masterfrom
sbryngelson:weno-beta-sos

sbryngelson commented May 7, 2026

Uh oh!

github-actions Bot commented May 7, 2026 •

edited

Loading

Uh oh!

qodo-code-review Bot commented May 9, 2026

Uh oh!

codecov Bot commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Conversation

sbryngelson commented May 7, 2026

Summary

Motivation

Test plan

Uh oh!

github-actions Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Findings

Uh oh!

qodo-code-review Bot commented May 9, 2026

Uh oh!

codecov Bot commented May 9, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented May 7, 2026 •

edited

Loading